Introduce 'markedForEviction' state for the Item.#183
Introduce 'markedForEviction' state for the Item.#183igchor wants to merge 1 commit intofacebook:mainfrom
Conversation
|
@jiayuebao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ba7cbd6 to
bd92995
Compare
|
@igchor has updated the pull request. You must reimport the pull request before landing. |
|
@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
| try { | ||
| return ref_.incRef(); | ||
| } catch (exception::RefcountOverflow& e) { | ||
| throw exception::RefcountOverflow( |
There was a problem hiding this comment.
We have an extra layer of try-catch here. I think the reason is that we want to use the return value of ref_.incRef for the predicate test instead of just exception.
@therealgymmy Is it okay to regress a bit with refcount over/underflow?
There was a problem hiding this comment.
The reason I've added this rethrow is to keep the abstraction but if this does affect performance perhaps we could just build the message on RefCount.h level?
| * currently in the process of being evicted. | ||
| * | ||
| * An item can only be marked exclusive when `isInMMContainer` returns true. | ||
| * An item can only be marked exclusive when `isInMMContainer` returns true |
There was a problem hiding this comment.
I've been thinking how we can make the docs better about these state transitions as now they are no longer just marking a bit. I suggest we do it in the following ways.
Taking markForEviction for as an example:
- What does the state mean. (An item is "for eviction" indicates the item is ready to be evicted.)
- When can an item enter this state. (An item can be marked for eviction when it has no access refcount and is both accessible and in MMContainer)
- What does it mean when an item is in the state. (An item is markedForEviction if its exclusive bit is set and accessRef is 0)
- What does leaving the state mean? (When unmakeForEviction, the exclusive bit of the item is unset. And does not depend on anything else).
Let's make sure that we touch these 4 points in both markEviction and markMoving.
There was a problem hiding this comment.
I've rewritten the comments.
| * | ||
| * An item can only be marked exclusive when `isInMMContainer` returns true. | ||
| * An item can only be marked exclusive when `isInMMContainer` returns true | ||
| * and item is not already exclusive nor moving and the ref count is 0. |
There was a problem hiding this comment.
Let's just say it "does not have exclusive bit set and access ref count is 0", which implies it is not moving.
|
|
||
| /** This function attempts to mark item for eviction. | ||
| * Can only be called on the item that is moving.*/ | ||
| bool markForEvictionWhenMoving() { |
There was a problem hiding this comment.
It looks like an item can be linked (inMMContainer) but not accessible and be marked as moving. Then enter forEviction via this function.
Is this a problem? Should we check isLinked in this function as well?
There was a problem hiding this comment.
Actually it does not matter whether the item isLinked at this point or not. However, I actually realised that markForEviction does not need to depend on isAccessible - this was just an optimization I implemented some time ago but it does not seem to matter anymore - I removed that check.
There was a problem hiding this comment.
I'm thinking we should do it the other way round: both makrForEviction and markMoving should check isAccessible.
Today, moving need the item to be in access container to succeed. (see moveRegularItem in CacheAllocator-inl.h).
There was a problem hiding this comment.
The problem is that we cannot require isAccessible for markMoving because markMoving is used also for SlabRelease and it should work for items not in the Access Container as well.
I can restore the check for markForEviction but as I said it's only an optimization. It's true that we require the item to be accessible in moveRegularItem but item can become inaccessible anytime after being marked as moving anyway (for example by concurrent insertOrReplace).
There was a problem hiding this comment.
You are right. We don't require accessible today in markExcluive (Basically mark moving).
therealgymmy
left a comment
There was a problem hiding this comment.
@igchor: can you help me understand why we need this change?
Today the goal of preventing anyone else from accessing an item while it is being prepared for eviction is to remove it from accessContainer. Is this insufficient?
My general impression reading the code is that we're changing the responsibility of managing an Item's state (and the right to access an item) to be exclusively within CacheItem logic itself. I don't understand why we're making this change. The change itself also doesn't seem to have reduced the complexity of eviction/moving management.
| * The following four functions are used to track whether or not | ||
| * an item is currently in the process of being moved. This happens during a | ||
| * slab rebalance or resize operation or during eviction. | ||
| * The following two functions correspond to whether or not an item is |
There was a problem hiding this comment.
Marking an item for eviction will prevent anyone from incrementing refcounts.
Can we achieve the same by removing this item from mmContainer and accessContainer?
There was a problem hiding this comment.
Yes, however, as I mentioned in the previous comment removing from accessContainer results in much higher contention (if done under mmContainer lock).
| * returns false and leaves refCount_ unmodified. | ||
| */ | ||
| template <typename P, typename F> | ||
| bool atomicUpdateValue(P&& predicate, F&& newValueF) { |
There was a problem hiding this comment.
nice. Thanks for abstracting this out.
bd92995 to
88309d4
Compare
|
@igchor has updated the pull request. You must reimport the pull request before landing. |
|
@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
|
||
| /** This function attempts to mark item for eviction. | ||
| * Can only be called on the item that is moving.*/ | ||
| bool markForEvictionWhenMoving() { |
There was a problem hiding this comment.
I'm thinking we should do it the other way round: both makrForEviction and markMoving should check isAccessible.
Today, moving need the item to be in access container to succeed. (see moveRegularItem in CacheAllocator-inl.h).
88309d4 to
b6e7301
Compare
|
@igchor has updated the pull request. You must reimport the pull request before landing. |
haowu14
left a comment
There was a problem hiding this comment.
Sending back to you for questions. Maybe I miss something.
Thanks for addressing my previous comments.
| // item is being evicted | ||
| return WriteHandle{}; | ||
| } | ||
| return WriteHandle{it, *this}; |
|
|
||
| /** This function attempts to mark item for eviction. | ||
| * Can only be called on the item that is moving.*/ | ||
| bool markForEvictionWhenMoving() { |
There was a problem hiding this comment.
You are right. We don't require accessible today in markExcluive (Basically mark moving).
b6e7301 to
2460d1c
Compare
|
@igchor has updated the pull request. You must reimport the pull request before landing. |
|
@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
|
@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
therealgymmy
left a comment
There was a problem hiding this comment.
@igchor: I made another pass. LGTM overall. Comments inline.
| * An item can only be marked moving when `isInMMContainer` returns true | ||
| * and does not have `kExclusive` bit set. |
There was a problem hiding this comment.
Will we still need to maintain isInMMContainer == true in later changes? I'm wondering if all threads can synchronize operations strictly on the item's refcount & flags.
There was a problem hiding this comment.
As I mentioned above, I belive we still need this for SlabRelease but I'll I think about it.
cachelib/allocator/Refcount.h
Outdated
| Value exclusiveBitMask = getAdminRef<kExclusive>(); | ||
|
|
||
| auto predicate = [linkedBitMask, exclusiveBitMask](const Value curValue) { | ||
| const bool linked = curValue & linkedBitMask; |
There was a problem hiding this comment.
nitpick: const bool unlinked = !(curValue & linkedBitMask);
easier to read if (unlinked || alreadyExclusive)
cachelib/allocator/Refcount.h
Outdated
| return (raw & getAdminRef<kExclusive>()) && ((raw & kAccessRefMask) != 0); | ||
| } | ||
|
|
||
| /** This function attempts to mark item for eviction. |
There was a problem hiding this comment.
Can you also add comments to explain that markForEvictionWhenMoving can only succeed when this item isOnlyMoving == true.
There was a problem hiding this comment.
Separately, @igchor: I am wondering if the changes in this PR are introducing a form of "upgrade" and "write" locks to our cache item.
The analogy I have in mind:
- Multiple readers can access an item. (No exclusive bit set. IncRef() act as the way to getting hold of the read-lock, which is released when ItemHandle goes out of scope)
- Multiple readers and one upgrade can co-exist. (Exclusive bit set. Refcount > 1)
- Only one upgrade lock can be held at a time. (First thread setting exclusive bit wins)
- Upgrade lock can be upgraded to a write lock but only without any concurrent readers. (Exclusive bit set. Refcount == 1 -> Refcount == 0)
- Write lock can be held without an upgrade lock in place as long as no concurrent readers.
- Readers are blocked when write lock is held. (Refcount fails when exclusive bit is set)
Once I start thinking in terms of read/upgrade/write, I'm starting to appreciate the simplicity brought by this PR much more.
There was a problem hiding this comment.
Yes, that's a good analogy. The logic changes slightly once we introduce WaitContext for moving items (in that case, 'readers' wait for 'upgrade' to finish it's work), but the reasoning for 'upgrading' still holds.
We even had an idea of using the 'moving' bit and Wait Context to replace some existing synchronization for Chained Items, but we haven't done any implementation yet.
cachelib/allocator/Refcount.h
Outdated
| // An item is only moving when its refcount is one and only the exclusive | ||
| // bit among all the control bits is set. This indicates an item is already | ||
| // on its way out of cache and does not need to be moved. |
There was a problem hiding this comment.
Is this comment still true? Considering we also require this state to mark an item for eviction.
There was a problem hiding this comment.
Right, the last part is not true - I removed it. We should be able to extend this description in later PR (when we introduce Wait Context for moving items).
| if ((++nCASFailures % 4) == 0) { | ||
| // this pause takes up to 40 clock cycles on intel and the lock cmpxchgl | ||
| // above should take about 100 clock cycles. we pause once every 400 | ||
| // cycles or so if we are extremely unlucky. | ||
| folly::asm_volatile_pause(); | ||
| } |
There was a problem hiding this comment.
Not relevant to this diff but I'm wondering if this is still necessarily. The 40 clock cycles and 100 clock cycles were some measurement from >5 years ago.
It also seems unlikely for this to contend so heavily in typical usage.
There was a problem hiding this comment.
We actually had some discussion and did some measurements of this internally. I'll try to get the data and see if it makes sense to change this.
cachelib/allocator/Refcount.h
Outdated
| const bool alreadyExclusive = curValue & bitMask; | ||
| if (!flagSet || alreadyExclusive) { | ||
| auto predicate = [linkedBitMask, exclusiveBitMask](const Value curValue) { | ||
| const bool linked = curValue & linkedBitMask; |
There was a problem hiding this comment.
nitpick again: notLinked = !(curValue & linkedBitMask)
cachelib/allocator/Refcount.h
Outdated
| * An item can only be marked exclusive when `isInMMContainer` returns true | ||
| * and the item is not yet marked as exclusive. This operation is atomic. | ||
| * An item can only be marked for eviction when `isInMMContainer` and | ||
| * `isAccessible` return true and item does not have `kExclusive` bit set |
There was a problem hiding this comment.
is this still true? We don't check for isAccessible here.
Separately, curious if we can remove the condition isInMMContainer == true down the road?
There was a problem hiding this comment.
Right, forgot to update the comment. Done.
The requirement for isInMMContainer == true is mainly needed for SlabRelease. Since SlabRelease does not take MMContainer lock, checking whether item is still in the container is the only possible synchronization mechanism - I'm not sure we can change that.
It is similar to 'moving' but requires ref count to be 0. An item which is marked for eviction causes all incRef() calls to that item to fail. This will be used to ensure that once item is selected for eviction, no one can interfere and prevent the eviction from suceeding. 'markedForEviction' relies on the same 'exlusive' bit as the 'moving' state. To distinguish between those two states, 'moving' add 1 to the refCount. This is hidden from the user, so getRefCount() will not return that extra ref.
2460d1c to
b9b7db3
Compare
|
@igchor has updated the pull request. You must reimport the pull request before landing. |
|
@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@igchor We are ready for the next PR! Thanks for working on this. |
|
@haowu14 thanks! The next PR is coming soon! |
Summary: This is the next part of the 'critical section' patch after #183. Original PR (some of the changes already upstreamed): #172 This PR contains changes to findEviction only. The remaining part (changes in SlabRelease code) will be provided in the next (final) PR. Pull Request resolved: #217 Reviewed By: therealgymmy Differential Revision: D45071460 Pulled By: haowu14 fbshipit-source-id: 4d44d75182537369a50e3c1ebb10a7c844449875
It is similar to 'moving' but requires ref count to be 0.
An item which is marked for eviction causes all incRef() calls to that item to fail.
This will be used to ensure that once item is selected for eviction, no one can interfere and prevent the eviction from suceeding.
'markedForEviction' relies on the same 'exlusive' bit as the 'moving' state. To distinguish between those two states, 'moving' add 1 to the refCount. This is hidden from the user, so getRefCount() will not return that extra ref.